This notebook describes setting up a model inspired by the Maxout Network of Goodfellow et al., which they ran on the CIFAR-10 dataset.

Their yaml file was modified as little as possible: we substituted %(...)s placeholders for the settings and dataset paths, removed their data pre-processing, and changed the number of channels.


In [ ]:
!obj:pylearn2.train.Train {
    dataset: &train !obj:neukrill_net.dense_dataset.DensePNGDataset {
            settings_path: %(settings_path)s,
            run_settings: %(run_settings_path)s,
            training_set_mode: "train"
    },
    model: !obj:pylearn2.models.mlp.MLP {
        batch_size: &batch_size 128,
        layers: [
                 !obj:pylearn2.models.maxout.MaxoutConvC01B {
                     layer_name: 'h0',
                     pad: 4,
                     tied_b: 1,
                     W_lr_scale: .05,
                     b_lr_scale: .05,
                     num_channels: 96,
                     num_pieces: 2,
                     kernel_shape: [8, 8],
                     pool_shape: [4, 4],
                     pool_stride: [2, 2],
                     irange: .005,
                     max_kernel_norm: .9,
                     partial_sum: 33,
                 },
                 !obj:pylearn2.models.maxout.MaxoutConvC01B {
                     layer_name: 'h1',
                     pad: 3,
                     tied_b: 1,
                     W_lr_scale: .05,
                     b_lr_scale: .05,
                     num_channels: 192,
                     num_pieces: 2,
                     kernel_shape: [8, 8],
                     pool_shape: [4, 4],
                     pool_stride: [2, 2],
                     irange: .005,
                     max_kernel_norm: 1.9365,
                     partial_sum: 15,
                 },
                 !obj:pylearn2.models.maxout.MaxoutConvC01B {
                     pad: 3,
                     layer_name: 'h2',
                     tied_b: 1,
                     W_lr_scale: .05,
                     b_lr_scale: .05,
                     num_channels: 192,
                     num_pieces: 2,
                     kernel_shape: [5, 5],
                     pool_shape: [2, 2],
                     pool_stride: [2, 2],
                     irange: .005,
                     max_kernel_norm: 1.9365,
                 },
                 !obj:pylearn2.models.maxout.Maxout {
                    layer_name: 'h3',
                    irange: .005,
                    num_units: 500,
                    num_pieces: 5,
                    max_col_norm: 1.9
                 },
                 !obj:pylearn2.models.mlp.Softmax {
                     max_col_norm: 1.9365,
                     layer_name: 'y',
                     n_classes: %(n_classes)i,
                     irange: .005
                 }
                ],
        input_space: !obj:pylearn2.space.Conv2DSpace {
            shape: &window_shape [32, 32],
            num_channels: 3,
            axes: ['c', 0, 1, 'b'],
        },
    },
    algorithm: !obj:pylearn2.training_algorithms.sgd.SGD {
        learning_rate: .17,
        learning_rule: !obj:pylearn2.training_algorithms.learning_rule.Momentum {
            init_momentum: .5
            },
        train_iteration_mode: 'even_shuffled_sequential',
        monitor_iteration_mode: 'even_sequential',
        monitoring_dataset:
            {
                'test' : !obj:neukrill_net.dense_dataset.DensePNGDataset  {
                                settings_path: %(settings_path)s,
                                run_settings: %(run_settings_path)s,
                                training_set_mode: "test"
                },
            },
        cost: !obj:pylearn2.costs.mlp.dropout.Dropout {
            input_include_probs: { 'h0' : .8 },
            input_scales: { 'h0' : 1. }
        },
        termination_criterion: !obj:pylearn2.termination_criteria.EpochCounter {
            max_epochs: 474 
        },
    },
    extensions: [
        !obj:pylearn2.training_algorithms.learning_rule.MomentumAdjustor {
            start: 1,
            saturate: 250,
            final_momentum: .65
        },
        !obj:pylearn2.training_algorithms.sgd.LinearDecayOverEpoch {
            start: 1,
            saturate: 500,
            decay_factor: .01
        },
        !obj:pylearn2.train_extensions.best_params.MonitorBasedSaveBest {
            channel_name: test_y_misclass,
            save_path: '%(save_path)s'
        },
    ],
}
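
A note on the num_pieces parameter that appears in every layer above: a maxout unit (Goodfellow et al.) outputs the maximum over several affine "pieces", so no separate nonlinearity is needed. Below is a minimal numpy sketch of the dense case, as in the h3 layer; the input size is illustrative and the column-to-piece grouping is just one convention (the convolutional layers apply the same max per group of channels):


In [ ]:
import numpy as np

def maxout(x, W, b, num_pieces):
    # x: (batch, n_in); W: (n_in, n_out * num_pieces); b: (n_out * num_pieces,)
    z = np.dot(x, W) + b                       # all affine pieces at once
    z = z.reshape(x.shape[0], -1, num_pieces)  # (batch, n_out, num_pieces)
    return z.max(axis=2)                       # keep the largest piece

# e.g. an h3-like layer: 500 units, 5 pieces each, illustrative input size
rng = np.random.RandomState(0)
x = rng.randn(128, 240)
maxout(x, rng.randn(240, 500 * 5) * .005, np.zeros(500 * 5), 5).shape  # (128, 500)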

Before we can start training the model, we need to create a dictionary with all the preprocessing settings and point it at the yaml file corresponding to the model: cifar10.yaml.


In [2]:
run_settings = {
    "model type":"pylearn2",
    "yaml file": "cifar10.yaml",
    "preprocessing":{"resize":[48,48]},
    "final_shape":[48,48],
    "augmentation_factor":1,
    "train_split": 0.8
}

To set up the path to the run settings file, we need to import neukrill_net.utils and os.


In [12]:
import neukrill_net.utils
import os
reload(neukrill_net.utils)


Out[12]:
<module 'neukrill_net.utils' from '/afs/inf.ed.ac.uk/user/s13/s1320903/Neuroglycerin/neukrill-net-tools/neukrill_net/utils.pyc'>

In [ ]:
cd ..

In [8]:
run_settings["run_settings_path"] = os.path.abspath("run_settings/cifar10_based.json")

In [15]:
run_settings


Out[15]:
{'augmentation_factor': 1,
 'final_shape': [48, 48],
 'model type': 'pylearn2',
 'preprocessing': {'resize': [48, 48]},
 'run_settings_path': '/afs/inf.ed.ac.uk/user/s13/s1320903/Neuroglycerin/neukrill-net-work/run_settings/cifar10_based.json',
 'train_split': 0.8,
 'yaml file': 'cifar10.yaml'}

Now the settings can be saved.


In [16]:
neukrill_net.utils.save_run_settings(run_settings)

In [17]:
!cat run_settings/cifar10_based.json


{
    "augmentation_factor":1,
    "final_shape":[
        48,
        48
    ],
    "model type":"pylearn2",
    "preprocessing":{
        "resize":[
            48,
            48
        ]
    },
    "run_settings_path":"/afs/inf.ed.ac.uk/user/s13/s1320903/Neuroglycerin/neukrill-net-work/run_settings/cifar10_based.json",
    "train_split":0.8,
    "yaml file":"cifar10.yaml"
}

Now we can start training the model with:


In [ ]:
!python train.py run_settings/cifar10_based.json
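
This is roughly what train.py does: fill the %(...)s placeholders in the yaml with values taken from the settings files, then hand the result to pylearn2. A minimal sketch, with illustrative values (in practice the paths and n_classes come from the settings and run settings JSON):


In [ ]:
from pylearn2.config import yaml_parse

# Illustrative substitutions matching the %(...)s placeholders in the yaml
substitutions = {
    "settings_path": "settings.json",
    "run_settings_path": "run_settings/cifar10_based.json",
    "n_classes": 121,  # assumed class count; read from the settings in practice
    "save_path": "cifar10_based_best.pkl",
}

with open("cifar10.yaml") as f:
    yaml_string = f.read() % substitutions  # plain Python %-formatting

train = yaml_parse.load(yaml_string)  # builds the Train object
train.main_loop()                     # runs training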

When we tried to run the model, it failed with an error saying that partialSum does not divide numModules. It turns out that partial_sum is a parameter of the convolutional layers that affects the performance of the weight gradient computation, and it has to divide the area of that layer's output grid, which is given by numModules. Conveniently, the error reported the values of numModules (which are not specified in the yaml file), so we simply changed partial_sum in each layer to a factor of the corresponding numModules.
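
As a quick sanity check on that arithmetic (a sketch: it assumes the inputs are really the 48x48 resized images from final_shape, which is consistent with the new values, and takes the 24x24 pooled grid feeding h1 from the error message):


In [ ]:
def conv_modules(size, pad, kernel):
    # side length of the conv output grid; numModules is its area
    side = size + 2 * pad - kernel + 1
    return side, side * side

side, num_modules = conv_modules(48, pad=4, kernel=8)  # h0
print side, num_modules, num_modules % 33, num_modules % 49  # 49 2401 25 0

side, num_modules = conv_modules(24, pad=3, kernel=8)  # h1
print side, num_modules, num_modules % 15, num_modules % 23  # 23 529 4 0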


In [ ]:
!obj:pylearn2.train.Train {
    dataset: &train !obj:neukrill_net.dense_dataset.DensePNGDataset {
            settings_path: %(settings_path)s,
            run_settings: %(run_settings_path)s,
            training_set_mode: "train"
    },
    model: !obj:pylearn2.models.mlp.MLP {
        batch_size: &batch_size 128,
        layers: [
                 !obj:pylearn2.models.maxout.MaxoutConvC01B {
                     layer_name: 'h0',
                     pad: 4,
                     tied_b: 1,
                     W_lr_scale: .05,
                     b_lr_scale: .05,
                     num_channels: 96,
                     num_pieces: 2,
                     kernel_shape: [8, 8],
                     pool_shape: [4, 4],
                     pool_stride: [2, 2],
                     irange: .005,
                     max_kernel_norm: .9,
                     partial_sum: 49, # was 33; must divide numModules = 49*49
                 },
                 !obj:pylearn2.models.maxout.MaxoutConvC01B {
                     layer_name: 'h1',
                     pad: 3,
                     tied_b: 1,
                     W_lr_scale: .05,
                     b_lr_scale: .05,
                     num_channels: 192,
                     num_pieces: 2,
                     kernel_shape: [8, 8],
                     pool_shape: [4, 4],
                     pool_stride: [2, 2],
                     irange: .005,
                     max_kernel_norm: 1.9365,
                     partial_sum: 23, # was 15; must divide numModules = 23*23
                 },
                 !obj:pylearn2.models.maxout.MaxoutConvC01B {
                     pad: 3,
                     layer_name: 'h2',
                     tied_b: 1,
                     W_lr_scale: .05,
                     b_lr_scale: .05,
                     num_channels: 192,
                     num_pieces: 2,
                     kernel_shape: [5, 5],
                     pool_shape: [2, 2],
                     pool_stride: [2, 2],
                     irange: .005,
                     max_kernel_norm: 1.9365,
                 },
                 !obj:pylearn2.models.maxout.Maxout {
                    layer_name: 'h3',
                    irange: .005,
                    num_units: 500,
                    num_pieces: 5,
                    max_col_norm: 1.9
                 },
                 !obj:pylearn2.models.mlp.Softmax {
                     max_col_norm: 1.9365,
                     layer_name: 'y',
                     n_classes: %(n_classes)i,
                     irange: .005
                 }
                ],
        input_space: !obj:pylearn2.space.Conv2DSpace {
            shape: &window_shape [32, 32],
            num_channels: 3,
            axes: ['c', 0, 1, 'b'],
        },
    },
    algorithm: !obj:pylearn2.training_algorithms.sgd.SGD {
        learning_rate: .17,
        learning_rule: !obj:pylearn2.training_algorithms.learning_rule.Momentum {
            init_momentum: .5
            },
        train_iteration_mode: 'even_shuffled_sequential',
        monitor_iteration_mode: 'even_sequential',
        monitoring_dataset:
            {
                'test' : !obj:neukrill_net.dense_dataset.DensePNGDataset  {
                                settings_path: %(settings_path)s,
                                run_settings: %(run_settings_path)s,
                                training_set_mode: "test"
                },
            },
        cost: !obj:pylearn2.costs.mlp.dropout.Dropout {
            input_include_probs: { 'h0' : .8 },
            input_scales: { 'h0' : 1. }
        },
        termination_criterion: !obj:pylearn2.termination_criteria.EpochCounter {
            max_epochs: 474 
        },
    },
    extensions: [
        !obj:pylearn2.training_algorithms.learning_rule.MomentumAdjustor {
            start: 1,
            saturate: 250,
            final_momentum: .65
        },
        !obj:pylearn2.training_algorithms.sgd.LinearDecayOverEpoch {
            start: 1,
            saturate: 500,
            decay_factor: .01
        },
        !obj:pylearn2.train_extensions.best_params.MonitorBasedSaveBest {
            channel_name: test_y_misclass,
            save_path: '%(save_path)s'
        },
    ],
}

The results were not very good (an NLL of roughly 3), so we are not going to continue working on this model.